Coding signals



This invention relates to a method of coding signals and to apparatus for storing,
transmitting, receiving or reproducing signals.



A common method of storing audio signals is to use parametric coding to represent audio
signals, especially at very low bit rates, typically in the region from 6 kbps to 90 kbps.
Examples of the use of parametric coding in this way are included in "Low bit rate high
quality audio coding with combined harmonic and wavelet representation" in Proceedings of
the IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 2,
pp 1045 to 1048, 1996; "Advances in Parametric Audio Coding" in Proceedings of the 1999
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp W99-1 to
W99-4, 1999; and "A 6 kbps to 85 kbps scalable audio coder" in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, pp 877-880, 2000. In
these examples, a parametric audio coder is described, in which an audio signal is
represented by a model, with parameters of the model being estimated and encoded. These
examples use a parametric representation of an audio signal based on decomposition of an
original signal into three components: a transient component, a tonal (sinusoidal)
component, and a noise component. Each component is represented by a corresponding set of
parameters, as described in the three documents above. A transient component of an audio
signal can be characterized as an isolated element of the audio signal which is relatively
short lived, and is represented by a sharp increase in energy of the audio signal.

It has been found that having a dedicated model for the transient component of an audio
signal proves to be beneficial for parts of audio signals with sharp attacks, because
sinusoidal and noise models cannot easily represent such perceptually important events and
poor modeling can result in audible artifacts such as a pre-echo. A pre-echo occurs when the
modeling error distributes the transient event to the samples before the transient beginning
and when the resulting distortion is large enough to become audible. The distribution of the
modeling error to the samples before the transient beginning results from the
segment-by-segment analysis of an input signal in an audio coder. If a transient occurs in
the middle of an analysis segment, then either a lot of coding resources are required in
order to accurately model the transient, or the modeling error distributes to the whole
analysis segment. Modeling error at the samples preceding a transient is typically
perceptually more apparent than at samples after the transient, because of weaker masking
from the transient event itself.

In "Residual modeling in music analysis-synthesis" from Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Volume 2, pp
1005-1008, 1996 it is shown that transient components cannot satisfactorily be represented
by sinusoidal and noise models alone.

It has been shown previously in "Robust exponential modeling of audio signals" from
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Volume 6, pp 3581-3584, 1998, that transients can be modeled efficiently using
sinusoids with exponentially modulated amplitudes (referred to below as damped sinusoids).
In the text below damping coefficients can be any real number, and positive values
correspond to increasing amplitudes rather than to truly decreasing amplitudes. In "Robust
exponential modeling of audio signals" (see above) an audio signal was analyzed on a
segment-by-segment basis and each segment was represented as a sum of damped sinusoids. A
problem arises with this type of coding when a transient starts in the middle of a given
segment. Compared to the case where a transient starts at the beginning of a segment, the
number of damped sinusoids needed to model the transient well increases considerably. If a
transient is not modeled properly, the modeling error is distributed over the whole of a
given segment, resulting in audible pre-echoes.

In the MPEG-1 Layer III audio coding algorithm, as described in "ISO-MPEG-1 Audio: a
generic standard for coding of high-quality digital audio" in the Journal of the Audio
Engineering Society, Volume 42, pp 780-792, October 1994, the segmentation is defined
simply by the lengths of the long and short windows.



It is an object of the present invention to address the above mentioned disadvantages. To
this end the invention provides a method of coding and an apparatus for coding as defined
in the independent claims. Advantageous embodiments are defined in the dependent claims.

According to a first aspect of the present invention the coding of an input signal
comprises:

- estimating a location of at least one transient in a time segment of the input signal;

- modifying the location of the transient so that the or each transient occurs at a
specified location on a predetermined time scale to obtain a modified signal; and

- modeling the modified signal.

The use of restricted time segmentation in the form of a specified location on a
predetermined time scale to provide the only locations for the transients advantageously
reduces the number of bits needed to describe the segmentation. Also the modification
procedure has lower computational cost compared to a full precision segmentation procedure.

Each transient is preferably relocated to a nearest specified location of a plurality of
possible locations on the predetermined time scale.

The specified locations on the predetermined time scale may be defined by integer multiples
of a predetermined minimum time segment size. The predetermined minimum time segment size
may have a length in the range of approximately 1 millisecond (ms) to approximately 9 ms,
most preferably in the range of approximately 4 ms to approximately 6 ms.
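
By way of illustration only, the relocation of a transient to the nearest specified location can be sketched in Python as follows. The function names and the rule that ties are rounded upward are assumptions made for this example; the 5 ms segment size and the 44.1 kHz sampling rate are values mentioned elsewhere in this description.

```python
def grid_size_in_samples(segment_ms, sample_rate_hz):
    """Convert a minimum segment size in milliseconds to whole samples."""
    return int(segment_ms * sample_rate_hz / 1000)


def nearest_grid_location(n0, grid):
    """Return the integer multiple of `grid` closest to sample index n0.

    Ties are rounded upward (an arbitrary choice for this sketch).
    """
    return ((n0 + grid // 2) // grid) * grid


grid = grid_size_in_samples(5, 44100)       # minimum segment size in samples
new_start = nearest_grid_location(1000, grid)
```

Because only multiples of the minimum segment size are allowed, a location can be described by a small integer index rather than by a full-precision sample position.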

The use of a restricted time segmentation as described advantageously simplifies the
modeling procedure significantly, if rate-distortion control is used to distribute coding
resources between transient, sinusoidal and noise components of the input signal being
modeled.

The modeling preferably uses damped sinusoids. 

The audio signal is preferably sampled at a rate of approximately 5 to 50 kHz, most
preferably 8, 16, 32, 44.1 or 48 kHz. The video signal is preferably sampled at a rate of
approximately 5 to 20 MHz.

The restricted time segmentation may also be applied to tonal and/or noise components of an
input signal.

The estimation of the location of transients may be carried out using an energy-based
approach, preferably with a moving window method, most preferably using two sliding windows.

The use of an energy-based approach allows the advantageous estimation of both very short
transients and longer transients.

The location of transients may involve the location of a beginning and an end of each
transient.

Preferably each located transient is moved by a cut and paste method from its original
location to begin at a location on the predetermined time scale.

The cut and paste method simply removes that part of the input signal identified as a
transient and moves it to the new location. Thus the step is very simple to implement.

A remaining section of the input signal between two located and modified transients is
preferably time-warped to fill the gap remaining following the relocation. The time-warp may
be a lengthening or a shortening of said remaining section.

By using knowledge of sound perception, including pitch perception and temporal masking
effects, the time-warping is a simple method with which to restore the remaining signal
after modification of the transients.

The time-warping preferably preserves the amplitudes of edge-points of the modified signal,
preferably by a band limited interpolation method.

The time-warp is preferably carried out by interpolation where the change in the
fundamental frequency, f0, of the remaining section is less than approximately 0.3%, most
preferably less than approximately 0.2%.

Otherwise, the remaining section is preferably split into a first length immediately after
the modified transient and a second length. Preferably, the first length is approximately
8 ms to 12 ms, most preferably approximately 10 ms. The first length is preferably
interpolated if the change of fundamental frequency caused is no more than approximately
1.6% to 2.4%, most preferably no more than approximately 2%. For the second length, the
change of fundamental frequency is preferably not more than about 0.16% to 0.24%, most
preferably approximately 0.2%.

Where the interpolation is insufficient to fill a gap in the remaining section an
overlap-add procedure is preferably used.
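
The choice between these alternatives can be sketched, purely as an illustration, with the following Python function. It assumes that the relative change in length of a section is an adequate stand-in for the relative change in fundamental frequency; the function name, parameter names and return labels are hypothetical, while the 0.2%, 2% and 10 ms figures are the most preferred values given above.

```python
def warp_plan(section_len, length_change, sample_rate_hz,
              first_ms=10.0, whole_limit=0.002,
              first_limit=0.02, second_limit=0.002):
    """Choose how to fill the gap between two relocated transients.

    section_len   -- samples in the remaining section before warping
    length_change -- samples to add (positive) or remove (negative)
    The relative change in length is used as an approximation of the
    relative change in fundamental frequency (an assumption).
    """
    needed = abs(length_change)
    # Case 1: the whole section can be warped within the 0.2 % limit.
    if needed <= whole_limit * section_len:
        return "interpolate"
    # Case 2: split into a short first length (up to 2 %) and the rest (0.2 %).
    first_len = min(section_len, int(first_ms * sample_rate_hz / 1000))
    second_len = section_len - first_len
    capacity = first_limit * first_len + second_limit * second_len
    if needed <= capacity:
        return "split"
    # Case 3: interpolation alone cannot fill the gap.
    return "overlap_add"
```

For a one second section at 44.1 kHz this sketch selects plain interpolation for a change of 50 samples, the split approach for 95 samples, and overlap-add for 110 samples.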

The modification of the location of the or each transient may be performed using a
transformation into a frequency domain, preferably with a discrete cosine transform. The
resulting sinusoidal representation may then be analyzed for transient locations using a
Hanning window. Preferably, the Hanning window has a length of approximately 512 samples
(where a sample has a length of one divided by a sampling frequency of the input signal),
preferably with an overlap between Hanning windows of 256 samples.

The input signal is preferably processed by dividing the input signal into a plurality of
time segments. The time segments may have a length in the range of approximately 0.5 s to
2 s, preferably a length of approximately 1 s.

Adjacent time segments are preferably arranged to overlap, preferably by approximately 5%
to approximately 15% of their length, more preferably the overlap is approximately 10% of
the time segment length, which overlap may be approximately 0.1 s. Where a transient is
located in an overlap of the adjacent time segments, the transient location is modified in
the time segment in which the transient is most centrally located.

The provision of an overlap in adjacent time segments advantageously allows the selection
of the time segment in which the transient is most centrally located, or more importantly
furthest from the beginning or end of the time segment.
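
As a minimal sketch of this selection, assuming time segments are given as sample ranges with an exclusive end, the segment in which a transient lies furthest from both borders can be found as follows. The function names are hypothetical, and the sketch expects the transient to lie inside at least one segment.

```python
def distance_to_border(location, seg_start, seg_end):
    """Distance in samples from `location` to the nearer segment border."""
    return min(location - seg_start, seg_end - 1 - location)


def choose_segment(location, segments):
    """Return the index of the segment in which `location` is most central.

    segments -- list of (start, end) sample ranges, end exclusive.
    Only segments that contain the location are considered.
    """
    candidates = [k for k, (start, end) in enumerate(segments)
                  if start <= location < end]
    return max(candidates,
               key=lambda k: distance_to_border(location, *segments[k]))


# Two one-second segments at 44.1 kHz overlapping by 4410 samples (0.1 s).
segments = [(0, 44100), (39690, 83790)]
```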

The invention extends to decoding audio or video signals coded according to the coding of
the first aspect.

An apparatus according to an embodiment of the invention may be an audio device, e.g. a
solid state audio device.

All of the features disclosed herein can be combined with any of the above aspects, in any
combination.

Preferred embodiments of the invention provide coding signals which coding has a more
simplified analysis procedure than has previously been described, coding signals which
coding has a lower computational cost than equivalent methods, and coding signals which
coding results in a reduction of the number of bits needed to describe a segmented signal.



Specific embodiments of the present invention will now be described, by way of example, and
with reference to the accompanying drawings, in which:

Figure 1 shows the performance of a damped sinusoidal model in the case of a restricted
segmentation of an audio signal for an original and a time shifted transient for a first
embodiment;

Figure 2 shows an original transient and its reconstruction with 25 damped sinusoids;

Figure 3 shows a time shifted transient and its reconstruction with 25 damped sinusoids for
the first embodiment;

Figure 4 is a flow diagram of the steps involved in the method of coding audio signals in
the first embodiment;

Figure 5 is a diagrammatic illustration of the modification of transient location in a
second embodiment;

Figure 6 is a diagrammatic illustration similar to that of Figure 5;

Figure 7 shows an original transient and its reconstruction;

Figure 8 shows a shifted transient and its reconstruction according to the second
embodiment;

Figure 9 is a flow diagram of the steps involved in the second embodiment; and

Figure 10 is a schematic diagram of an audio encoder and an audio decoder utilizing the
methods described herein.



The first method disclosed herein, and as shown in Figure 4, uses a restricted time
segmentation, in which segments of an audio signal are defined by integer multiples of a
predefined minimum segment size, which in the example used is 5 ms, but of course this
could vary. In view of the restricted time segmentation the transient component of the audio
signal is modified such that transients can start only at the beginning of a segment. The
modified signal is then modeled, in this example by using damped sinusoids. This results in
an efficient representation of transients with damped sinusoids.

The coding of audio involves a first step of modifying the location of transient elements
of the signal so that the transients can occur only at locations defined by a relatively
coarse time grid, as described below in the discussion of experimental results. In order to
modify the locations of transients in the audio signal the following steps are taken:

1. The transient component of an original audio signal is estimated and is subtracted from
the original audio signal to form a residual signal.

2. The locations of the estimated transients are then modified in such a way that the
transients can only occur at locations specified on a grid.

During the transient estimation and modification, it has been verified that when the
modified transient signal is added to the residual signal obtained in step 1 above, there
is no perceptual difference between the obtained signal and the original audio signal.

In order to modify the transient locations it is necessary to estimate the transient
component of the original audio signal to be coded. It is possible to use different
transient models in parametric coding of audio. One example which has been used is the
transient model based on duality between the time and frequency domain presented in
"Transient modeling synthesis: a flexible analysis/synthesis tool for transient signals", in
Proceedings of the International Computer Music Conference, pp 25-30, 1997.

In more detail, the transient estimation model presented in the above reference is based on
the duality between the time and the frequency domain. A delta impulse in the time domain
corresponds to a sinusoid in the frequency domain. Furthermore, a sharp transient in the
time domain corresponds to a frequency domain signal which can be represented efficiently
by a sum of sinusoids. More specifically, the transients are estimated using the following
steps.

1. A discrete cosine transform (DCT) is used to transform a time domain segment to the
frequency domain. The segment size (equivalently, the DCT size) should be sufficiently
large to ensure that a transient is a short event in time (thus, transformed to the
frequency domain, it can be modeled efficiently by sinusoids). A block size of about 1 s
has been found to be sufficient.

2. The frequency domain (DCT domain) signal is analysed with a sinusoidal model. One
example which has been used is a consistent iterative sinusoidal analysis/synthesis with
Hanning-windowed sinusoids, as described in "High quality consistent analysis-synthesis in
sinusoidal coding", from Proceedings of the Audio Engineering Society 17th Conference "High
quality audio coding", pp 244-250, 1999.

The sinusoidal analysis of a DCT domain segment is done on a segment by segment basis. As a
result, the DCT-domain segment is represented as

$$\hat{x}_i(l) = h(l) \sum_{j} A_{ij} \cos(\omega_{ij} l + \theta_{ij}), \quad l = 0, \ldots, L-1, \quad i = 1, \ldots, I \qquad (1)$$

where $L$ is the length of the sinusoidal segments (the shift between sinusoidal segments
is $L/2$). The length of the sinusoidal segments, $L$, is a small fraction of the DCT size,
$h(l)$ are samples of the Hanning window, and $\{A_{ij}, \omega_{ij}, \theta_{ij}\}$ are
amplitudes, frequencies and phases of the estimated sinusoids respectively. The index $i$
denotes a particular sinusoidal segment within the DCT-domain segment, while the index $j$
denotes a particular sinusoid within the sinusoidal segment. The information about the
location of a transient in a time domain segment is contained in the frequency parameters
of the corresponding sinusoids. A transient in the beginning of a segment results in low
sinusoidal frequencies, while a transient in the end of the segment results in high
sinusoidal frequencies. The frequency resolution of the sinusoidal model depends on the
required resolution in estimation of transient locations. If the required time resolution
is one sample then the required frequency resolution is defined by the reciprocal of the
DCT size.

Due to the duality between the transient location in a time domain segment and the
frequencies of the corresponding sinusoids, the obvious way to modify the transient
location is to modify the corresponding frequencies (plus a correction in the phase
parameters). The transient location in the time domain segment is denoted by $n_0$ and the
closest allowed location from a time grid is denoted by $\hat{n}$. Then the desired time
shift is defined as

$$\Delta n = n_0 - \hat{n} \qquad (2)$$

In order to modify the transient location by $\Delta n$ the frequencies $\omega_{ij}$ and
phases $\theta_{ij}$ corresponding to the transient should be modified as follows:

$$\hat{\omega}_{ij} = \omega_{ij} - \frac{\pi \, \Delta n}{N} \qquad (3)$$

where $N$ is the DCT size.

No modification of amplitudes $A_{ij}$ is needed.

Note that the above procedure is different from independent quantization of sinusoidal
parameters. All frequencies corresponding to one transient are modified by the same amount.
This, together with the phase correction of equation (4) above, ensures that the shape of
the time domain transient is preserved, only the location is modified.
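
The relation between a time shift and a frequency offset can be checked with a short Python sketch. It assumes the DCT-II convention, under which a unit impulse at sample n transforms to cos(pi (2n + 1) k / (2N)) in the bin index k, so that moving the impulse by one sample changes the frequency by pi / N. The function names are hypothetical and the phase correction is not modeled here.

```python
import math


def impulse_frequency(n, dct_size):
    """Frequency (radians per DCT bin) of the DCT-II of a unit impulse at n.

    Assumes the DCT-II kernel cos(pi * (2n + 1) * k / (2N)).
    """
    return math.pi * (2 * n + 1) / (2 * dct_size)


def shift_frequencies(omegas, delta_n, dct_size):
    """Offset every frequency of one transient by the same amount.

    delta_n is the original location minus the desired grid location.
    """
    offset = math.pi * delta_n / dct_size
    return [w - offset for w in omegas]


N = 44288                      # DCT size used in the experiments
n0, n_hat = 1000, 1100         # original and grid locations (example values)
shifted = shift_frequencies([impulse_frequency(n0, N)], n0 - n_hat, N)
```

In this sketch the shifted frequency coincides with the frequency that an impulse located at the grid position would produce, which is the duality the modification relies on.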

Because the DCT size is relatively large at one second, more than one transient can occur
in a time domain segment. In this case, the model has to identify sinusoidal parameters
corresponding to different transients. This is done by declaring close sinusoidal
frequencies $\omega_{ij}$ to represent the same transient. Specifically, two sinusoids
having frequencies differing by not more than $\delta_\omega$ are declared to represent the
same transient and two sinusoids having frequencies differing by more than $\delta_\omega$
are declared to represent different transients. Then locations of all transients are
modified separately. Below when reference is made to a group of frequencies $\omega_{ij}$,
reference is being made to the frequencies of the sinusoids corresponding to a particular
transient.
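
One possible way of forming such groups is sketched below in Python. It assumes that sinusoids are chained together whenever neighbouring frequencies, taken in sorted order, differ by no more than a threshold; the function name and the parameter `delta_omega` are hypothetical, and the threshold value used in practice is not stated here.

```python
def group_by_frequency(omegas, delta_omega):
    """Group sinusoidal frequencies that are assumed to share a transient.

    Frequencies are sorted, and a new group is started whenever the gap
    to the previous frequency exceeds delta_omega.
    """
    groups = []
    for w in sorted(omegas):
        if groups and w - groups[-1][-1] <= delta_omega:
            groups[-1].append(w)    # close to the previous one: same transient
        else:
            groups.append([w])      # large gap: a different transient
    return groups
```

Each resulting group can then be shifted separately, with all frequencies in one group offset by the same amount.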

A transient can occur at the beginning or at the end of a time domain segment. In this
case, the modification of sinusoidal frequencies can yield frequencies below 0 or above
$\pi$. This results in the distortion of the shape of the time domain transient. To account
for this, an overlap is allowed between time domain segments (0.1 seconds). In this case a
transient can appear in two overlapping segments, i.e. in the region of mutual overlap.
Because the overlap is sufficiently large, if the transient is located very close to a
border of one of the overlapping segments, then it is located at a safe distance from a
border of the other segment. It is straightforward to identify the transient location from
sinusoidal frequencies, and therefore it is easy, knowing the estimated sinusoidal
frequencies in the two overlapping segments, to identify when a transient is represented in
two segments. If such a situation occurs, the corresponding sinusoids are cancelled in the
segment where the transient is closer to the corresponding border.

A typical transient lasts for more than one time sample. A natural question is then what is
the location $n_0$ of the transient. After the modification of location the corresponding
sample of the transient will be placed at location $\hat{n}$ corresponding to the beginning
of a segment defined by the time grid. Therefore, it is important that the estimated value
$n_0$ corresponds to the start of the transient. The time domain approach described below
has proved to yield good results. First, the time samples $n_{\min}$ and $n_{\max}$ are
identified corresponding to the frequency values $\min(\omega_{ij})$ and
$\max(\omega_{ij})$, where $\omega_{ij}$ are frequencies of sinusoids corresponding to a
particular transient. Next, the highest amplitude of the estimated transient signal in the
time interval $[n_{\min}, n_{\max}]$ is found. Then, the start sample of the transient
$n_0$ is defined to be the first sample in the interval $[n_{\min}, n_{\max}]$ having
amplitude higher than 10% of the highest amplitude.
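
The 10% rule can be illustrated by the following minimal Python sketch, which takes the estimated transient signal as a list of samples. The function name and the `fraction` parameter are hypothetical; amplitude is interpreted here as absolute sample value, which is an assumption.

```python
def transient_start(transient, n_min, n_max, fraction=0.1):
    """First index in [n_min, n_max] whose magnitude exceeds fraction * peak."""
    window = [abs(v) for v in transient[n_min:n_max + 1]]
    threshold = fraction * max(window)
    for offset, magnitude in enumerate(window):
        if magnitude > threshold:
            return n_min + offset
    # Only reached when every sample in the interval is zero.
    return n_min
```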

Typically, the estimated transient component of an audio signal contains samples of small
amplitudes before the sample $n_0$. Because the time sample $n_0$ is declared to be the
first sample of the transient and no transient can occur at a distance defined by
$\delta_\omega$ before the transient, the corresponding samples before $n_0$ are forced to
have zero amplitude. As a result, these samples go to the residual signal with their
original amplitudes.

Having estimated the location of transients and modified their location as described above,
the modified signal can now be modeled to allow the signal to be coded.

A damped sinusoidal model is used to model the modified signal, which aims at approximating
a signal $s$ with a sum of sinusoids with exponentially modulated amplitudes, i.e.,

$$\hat{s}(n) = \sum_{m=1}^{M} r_m \, p_m^{\,n}, \quad n = 0, \ldots, N-1 \qquad (5)$$

where $r_m, p_m \in \mathbb{C}$ and $N$ is the segment length. Equation 5 expresses
$\hat{s}(n)$ as the sum of $M$ damped (complex) exponentials. The parameter $r_m$
determines the initial phase and amplitude while $p_m$ determines the frequency and
damping. In order to determine the parameters $r_m$ and $p_m$ for the $M$ exponentials the
matching pursuit algorithm was used, as described in "Matching pursuits with time-frequency
dictionaries", IEEE Transactions on Signal Processing, Volume 41, pp 3397-3415, December
1993. Matching pursuit approximates a signal by a finite expansion into elements chosen
from a redundant dictionary. Let $D = (g_\gamma)_{\gamma \in \Gamma}$ be a complete
dictionary of unit-norm elements. The matching pursuit algorithm is a greedy iterative
algorithm which projects a signal $s$ onto the dictionary element $g_\gamma$ that best
matches the signal and subtracts this projection to form a residual signal to be
approximated in the next iteration. Finding the best matching dictionary element consists
of computing the inner products $\langle s, g_\gamma \rangle$ and selecting the element
that maximises the inner product. In order to find the parameters $r_m$ and $p_m$ a
dictionary is constructed consisting of damped exponentials,

$$g_{\alpha,\omega}(n) = c \, e^{\alpha n} e^{i \omega n}, \quad n = 0, \ldots, N-1 \qquad (6)$$

where the constant $c$ is introduced for having unit-norm dictionary elements, and the
inner products of the residual signal at iteration $m$, $s_m$, and the dictionary elements
defined in equation 6 are computed:

$$\langle s_m, g_{\alpha,\omega} \rangle = c \sum_{n=0}^{N-1} s_m(n) \, e^{\alpha n} e^{-i \omega n} \qquad (7)$$

By doing this for different values of $\alpha$, the transfer function $S_m(z)$ is evaluated
on circles in the complex z-plane having radius $e^{-\alpha}$.
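
A minimal Python sketch of matching pursuit over a small dictionary of damped complex exponentials is given below for illustration. It searches a coarse, hand-picked grid of damping and frequency values by brute force, which is an assumption made for brevity rather than the search strategy of the cited reference; all function and parameter names are hypothetical.

```python
import cmath
import math


def damped_atom(alpha, omega, n_samples):
    """Unit-norm damped exponential c * exp((alpha + 1j * omega) * n)."""
    raw = [cmath.exp((alpha + 1j * omega) * n) for n in range(n_samples)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in raw))
    return [v / norm for v in raw]


def inner(x, g):
    """Inner product <x, g> = sum over n of x(n) * conj(g(n))."""
    return sum(a * b.conjugate() for a, b in zip(x, g))


def matching_pursuit(signal, alphas, omegas, n_atoms):
    """Greedy expansion of `signal` into damped exponentials.

    Returns a list of (coefficient, alpha, omega) and the final residual.
    """
    residual = [complex(v) for v in signal]
    atoms = [(a, w, damped_atom(a, w, len(signal)))
             for a in alphas for w in omegas]
    picks = []
    for _ in range(n_atoms):
        # Select the atom with the largest inner-product magnitude.
        a, w, g = max(atoms, key=lambda t: abs(inner(residual, t[2])))
        coeff = inner(residual, g)
        # Subtract the projection to form the next residual.
        residual = [r - coeff * v for r, v in zip(residual, g)]
        picks.append((coeff, a, w))
    return picks, residual
```

Each selected coefficient plays the role of the amplitude and initial phase parameter, while the pair of damping and frequency values fixes the corresponding complex exponential.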

The method described above has been experimentally tested and the following gives results
and discussion of computer simulations and informal listening tests performed on audio
signals. The audio excerpts used were a castanet signal, songs by ABBA, Celine Dion,
Metallica and a vocal by Suzanne Vega. The signals were sampled at 44.1 kHz. The DCT size
is 44288 samples (approximately 1 second) and the overlap between time domain segments is
4410 samples (0.1 seconds). The sinusoidal analysis of the DCT domain signals is done using
Hanning windows of length 512 samples and mutual overlap of 256 samples. The transient
component of the signal was estimated and subtracted to form the residual signal. Next, the
transient locations were modified according to a time grid of 220 samples (approximately
5 ms).

It is important to verify that the modification of the transient locations does not
introduce any audible distortion. To check that, the modified transient signal was added to
the residual signal. The listening tests conducted verified that there is no perceptual
difference between the thus obtained signal and the original audio signal.

In the following, the improvement due to the modification procedure will be illustrated.
Also discussed is the performance of a damped sinusoidal model with this restricted
segmentation for an original transient signal (i.e. generally a transient starts at an
arbitrary location) and the modified transient signal (a transient starts in the beginning
of a segment). The optimal restricted time segmentation (with the minimum segment size of
220 samples) for damped sinusoids is found using the technique proposed in "Flexible
tree-structured signal expansions using time-varying wavelet packets" in IEEE Transactions
on Signal Processing, Volume 45, pp 333-345, February 1997. The performance is studied in
terms of signal-to-noise ratio (SNR) versus number of damped sinusoids NDS and is well
illustrated by Figure 1 where results are presented for a particular transient of the
castanet signal; A represents the original transient and B represents the shifted
transient. The modification procedure results in a considerably smaller number of damped
sinusoids needed to represent the transient with a certain quality than would previously
have been the case. The lower plots of Figures 2 and 3 show the reconstruction with 25
damped sinusoids of the original and the modified transients, respectively. In these
Figures t[ms] denotes time in milliseconds. The original transient is not located in the
beginning of the segment and, as a result, the modeling error is distributed to samples
before the transient. This results in an audible pre-echo. On the other hand, the modified
transient is located in the beginning of the segment and, as a result, the pre-echo problem
is eliminated.

Figure 4 shows a flow diagram of the first embodiment having steps S1 to S6, where:

S1 represents: Estimate the location of transients in a first time segment of an input
signal, by a transformation into the frequency domain.

S2 represents: Modify the location of the transients in the spatial domain by modifying the
corresponding frequencies, to locations on a predetermined time scale.

S3 represents: Estimate the location of transients in second and subsequent time segments
of the transient signal, by a transformation into the frequency domain.

S4 represents: Modify the location of the transients in the spatial domain by modifying the
corresponding frequencies, to locations on a predetermined time scale.

S5 represents: Decompose an audio signal into transient, tonal and noise components.

S6 represents: Recombine the decomposed signal for transmission or playback.

It may be possible that a similar improvement to that mentioned above could be achieved in
the case of a full-precision variable segmentation (and no signal modification). However,
the restricted segmentation and the modification procedure result in a much lower total
computational cost. Also, less side information is required to describe the restricted
segmentation.

A second embodiment of the coding method involves a different method of estimating the
location of transients in an input signal and a different modification procedure. The
locations of transients are modified in such a way that a transient can only occur at the
beginning of a sinusoidal segment, which sinusoidal segments are defined by a specified
segment size, which may be 5 milliseconds (ms); this is referred to as a restricted
segmentation, and corresponds to that of the first embodiment. The reference to a beginning
of a sinusoidal segment can be taken to be a reference to a beginning of a time grid in the
first embodiment: the reference to a sinusoid simply refers to the modeling procedure used.

This second embodiment uses the same idea as the first embodiment in that transient
locations are modified to improve the modeling of signals, in particular, audio signals.
However, this second embodiment provides an improved method of modifying the location of
transients.

To summarize the first method, the input signal was modified by estimating the location of
transient components using a model based on the duality between the time and frequency
domain for the signal; subtracting the transient component; modifying the locations of
transients such that their beginnings can only occur at the beginnings of sinusoidal
segments in a restricted segmentation; and adding the modified transient to the residual
signal in order to obtain a modified audio signal.

In outline, the method of the second embodiment involves detecting the beginnings and ends
of transients in an audio signal using an energy based approach with two sliding
rectangular windows, as described in "Audio subband coding with improved representation of
transient signal segments", from Proceedings of EUSIPCO, pages 2345-2348, Greece 1998,
incorporated herein by reference; followed by moving the identified transients to locations
specified by a chosen time grid or sinusoidal segmentation grid; and time-warping parts of
the signal between the identified transients in order to fill the intervals between the
modified transients.

The transient detection approach as described in "Audio subband coding with improved
representation of transient signal segments" mentioned above is based on the evaluation of
a criterion function C(n), where n is a time sample and C(n) is computed from the energies
of the input signal within rectangular windows of fixed length on the left- and right-hand
side of the time sample n. Significant peaks of the criterion function C(n) correspond to
the beginnings of transients. The end of a transient is defined by searching the first
value of C(n) after the beginning of the transient which is just below a certain threshold.
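
The two-window idea can be sketched in Python as follows. The exact form of C(n) is not reproduced in this text, so the sketch assumes, purely for illustration, that the criterion is the logarithm of the ratio of the right-hand window energy to the left-hand window energy; the function names, the window width and the small `floor` constant that avoids division by zero are likewise hypothetical.

```python
import math


def criterion(x, width, floor=1e-12):
    """Assumed criterion: log of right-to-left window energy at each sample.

    Samples closer than `width` to either end are left at 0.0 because a
    full window is not available on both sides there.
    """
    c = [0.0] * len(x)
    for n in range(width, len(x) - width + 1):
        left = sum(v * v for v in x[n - width:n])     # energy before n
        right = sum(v * v for v in x[n:n + width])    # energy from n onward
        c[n] = math.log((right + floor) / (left + floor))
    return c


def transient_onset(x, width):
    """Index of the largest peak of the assumed criterion function."""
    c = criterion(x, width)
    return max(range(len(c)), key=c.__getitem__)
```

With a quiet signal that contains a short loud burst, the largest value of this assumed criterion falls on the first sample of the burst, where the right-hand window is full of energy and the left-hand window is not.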

Once the beginnings and ends of the transients have been located using the above method the
transients are simply removed from the signal and relocated to the nearest location on the
specified sinusoidal segmentation grid, effectively by a cut and paste method. This part of
the procedure is particularly straightforward and is easily implemented by the person
skilled in the art.
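
A minimal Python sketch of the bookkeeping behind the cut and paste step is given below. It works on sample ranges rather than on the audio itself, and it assumes that each transient keeps its duration and that ties between two grid locations are rounded upward; the function names are hypothetical.

```python
def relocate_transients(transients, grid):
    """Snap each transient start to the nearest grid location.

    transients -- list of (start, end) sample ranges, end exclusive.
    The duration of every transient is left unchanged.
    """
    moved = []
    for start, end in transients:
        new_start = ((start + grid // 2) // grid) * grid
        moved.append((new_start, new_start + (end - start)))
    return moved


def gap_changes(original, moved):
    """Change in length (samples) of each gap between consecutive transients."""
    changes = []
    for k in range(len(original) - 1):
        old_gap = original[k + 1][0] - original[k][1]
        new_gap = moved[k + 1][0] - moved[k][1]
        changes.append(new_gap - old_gap)
    return changes
```

A positive change means the signal part between two transients has to be stretched, and a negative change means it has to be compressed.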

As would be appreciated, due to the modification of the transient locations, the distance
between two consecutive transients in an audio signal can become longer (e.g. if one is
shifted forward and the other is shifted backward), or the distance can become shorter (e.g.
if a first transient is shifted backwards and a second transient is shifted forwards in time).
In Figure 5, examples of transient modification where the distance is increased are shown,
whereas in Figure 6, a reduced distance between transients is shown. In order to fill the
interval between the modified transients, the signal part in between must be modified to
allow for the greater or smaller distance between transients.

The signal is modified by time-warping. This is done in such a way that the correct
amplitudes of the edge points of the signal in between the transients are preserved, so that
no discontinuities are introduced just before or just after a transient, as described below.
The time-warping results in the signal between transients being stretched (as shown in
Figure 5) or compressed (as shown in Figure 6). To compute the amplitudes at the new integer
sampling positions based on the known amplitudes of the original samples, a band-limited
interpolation method based on sinc functions is used (band-limited interpolation is described
in Proakis and Manolakis, "Digital Signal Processing: Principles, Algorithms, and
Applications", Prentice-Hall International, 1996). A modified Hanning window is used. To
compute the amplitude of each new sample, the amplitudes of eight original samples are used,
four at each side of the new sample.
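The interpolation may be sketched as below, under several assumptions that are not taken
from the text: a plain Hann taper stands in for the modified Hanning window, whose exact form
is not given here; samples beyond either edge are replaced by the edge sample; and the
weights are normalised so that a constant signal stays constant. The function names are
hypothetical. Because the length changes of interest are small, the sketch does not lower
the interpolation bandwidth when compressing.

```python
import math


def sinc(t):
    """Normalised sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def hann(d, half_width):
    """Hann taper: 1 at d = 0, falling to 0 at |d| >= half_width."""
    if abs(d) >= half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * d / half_width))


def time_warp(x, new_len, taps_per_side=4):
    """Resample x to new_len samples with a windowed-sinc interpolator.

    The first and last output samples coincide with the first and last input
    samples, so the edge amplitudes are preserved.
    """
    if len(x) < 2 or new_len < 2:
        raise ValueError("need at least two input and two output samples")
    y = []
    for m in range(new_len):
        # Position of output sample m on the input time axis.
        t = m * (len(x) - 1) / (new_len - 1)
        base = math.floor(t)
        acc = 0.0
        norm = 0.0
        # Eight neighbours by default: four on each side of t.
        for k in range(base - taps_per_side + 1, base + taps_per_side + 1):
            d = t - k
            w = sinc(d) * hann(d, taps_per_side)
            sample = x[min(max(k, 0), len(x) - 1)]  # clamp at the edges (assumption)
            acc += w * sample
            norm += w
        y.append(acc / norm)
    return y
```

Calling time_warp(x, len(x) + 20) stretches the signal part by 20 samples, and a smaller
new_len compresses it; in both cases y[0] and y[-1] match x[0] and x[-1].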

The stretching or compressing of a signal results, for tonal signals, in a corresponding
change of the fundamental frequency, f0. The goal of the modification procedure is to ensure
that the induced modifications of f0 are not audible.

In order to achieve the modification, the following algorithm is used for time-warping the
part of the signal between the two identified and modified transients:

(a) If the required change in length of a signal part in between two transients results in a
change of f0 by no more than 0.2%, the signal is simply subjected to a band-limited
interpolation method based on sinc functions. This is the example shown in Figures 5a
and 6a. If f0 changes by more than 0.2%, then follow step (b) as described below.

The reason for the limit of 0.2% is that it has been determined from the literature on
psycho-acoustics that changing f0 of a tonal sound by 0.2% can be audible, as described in
"An Introduction to the Psychology of Hearing", Academic Press, 1997. Our own experiments
verify this result.

(b) The signal part in between two transients is split into two non-overlapping intervals;
the first interval is located directly after the end of the first transient and lasts 10 ms (as
illustrated by interval 1 in Figures 5b and 6b), and the second interval is the remaining
part, i.e. it lasts until the beginning of the second transient (as shown by interval 2 in
Figures 5b and 6b). The lengths of the two intervals are modified by a different amount.
If the required change in length of the signal part in between two transients can be achieved
by changing f0 in the first interval by no more than 2% and in the second interval by no
more than 0.2%, then the signal in the two intervals is time-warped correspondingly, as
shown in the lower parts of Figures 5b and 6b. Otherwise go to step (c) as described
below.

The reasoning behind step (b) is that the interval directly after the end of a transient is
the interval where the masking effect from the transient is strong. Therefore, larger changes
of the signal in this interval are possible before they become audible. Our experiments
verify that a change of f0 by no more than 2% in the interval 10 ms directly after the end of
a transient is inaudible.

(c) Time-warp the signal in the two intervals such that the resulting change of f0 is no more
than 2% in interval 1 and no more than 0.2% in interval 2. If the resulting change
in length is not sufficient to fill the distance between the shifted transients, then apply an
overlap-add procedure with a modified Hanning window using samples from the two
intervals in order to increase or decrease the length of the signal. To ensure a smooth
transition between the two intervals, the length of the overlap-add region is chosen to be
larger than required to obtain a correct length of the signal in between two transients
(Figures 5c and 6c).
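The choice between steps (a), (b) and (c) may be sketched as follows. The sketch relies on
the relation that resampling a part of N samples to N' samples, played back at an unchanged
sampling rate, scales f0 by N/N'. The 0.2%, 2% and 10 ms figures are those given above; the
function names and the treatment of the exact boundary values are assumptions.

```python
F0_LIMIT_MASKED = 0.02      # 2 % within the 10 ms after a transient
F0_LIMIT_UNMASKED = 0.002   # 0.2 % elsewhere
MASKED_MS = 10.0            # duration of interval 1 in milliseconds


def f0_change(old_len, new_len):
    """Relative change of f0 when old_len samples are resampled to new_len samples."""
    return old_len / new_len - 1.0


def length_range(old_len, limit):
    """Shortest and longest length reachable while keeping |f0 change| <= limit."""
    return old_len / (1.0 + limit), old_len / (1.0 - limit)


def choose_step(old_len, new_len, sample_rate):
    """Return 'a', 'b' or 'c' for a part of old_len samples that must become new_len."""
    if abs(f0_change(old_len, new_len)) <= F0_LIMIT_UNMASKED:
        return "a"
    # Interval 1: the first 10 ms after the transient; interval 2: the remainder.
    n1 = min(old_len, round(MASKED_MS * 1e-3 * sample_rate))
    n2 = old_len - n1
    lo1, hi1 = length_range(n1, F0_LIMIT_MASKED)
    lo2, hi2 = length_range(n2, F0_LIMIT_UNMASKED)
    if lo1 + lo2 <= new_len <= hi1 + hi2:
        return "b"
    # Time-warping alone cannot reach new_len; overlap-add supplies the rest.
    return "c"
```

At 44.1 kHz, a part of 4410 samples (100 ms) can reach 4415 samples under step (a), needs
step (b) to reach 4425 samples, and needs step (c) to reach 4440 samples.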

In Figures 5 and 6 the new locations of transient beginnings are depicted with small arrows.
In Figure 5 the signal part in between two transients becomes longer. In Figure 6 the signal
part in between two transients becomes shorter. In the lower part of Figure 6c a small
vertical shift is shown for clarity's sake.

Various computer simulations of the method of the second embodiment, together with
informal listening tests with audio signals, were carried out. The audio excerpts used were
castanets, bass, trumpet, Celine Dion, Metallica, harpsichord, Eddie Rabbit, Stravinsky and
OrSl. The signals were sampled at 44.1 kHz. The transient locations were modified according
to a time grid of 220 samples (approximately 5 ms). It is important to verify that the
modification of transient locations does not introduce any audible distortion. The listening
tests conducted verified that there is no perceptual difference between the original and
modified audio signals.
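The stated grid spacing can be checked with a one-line conversion; the helper name is
hypothetical.

```python
def samples_to_ms(num_samples, sample_rate_hz):
    """Duration in milliseconds of num_samples at the given sampling rate."""
    return 1000.0 * num_samples / sample_rate_hz
```

For 220 samples at 44.1 kHz this gives a little under 4.99 ms, consistent with the figure of
approximately 5 ms.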

Next, it was demonstrated that there is an improvement in the modeling of the signal due to
the modification procedure. A comparison was made between the performance of a damped
sinusoidal model with the restricted segmentation for an original transient signal (i.e.
generally a transient starts at an arbitrary location) and for a modified transient signal (a
transient starts at the beginning of a segment, as defined by the present method). The lower
parts of Figures 7 and 8 show the reconstruction with 25 damped sinusoids of the original and
the modified transients, respectively. The original transient is not located at the beginning
of the segment and, as a result, the modeling error is distributed to samples before the
transient. This results in an audible pre-echo, shown by the amplitude of the signal in the
lower part of Figure 7 between 5 ms and approximately 7.5 ms, which is not present in the
upper part of Figure 7 that shows the original transient. On the other hand, the modified
transient is located at the beginning of the segment and, as a result, the pre-echo is
eliminated, as demonstrated in Figure 8 in that the amplitude of the signal for the upper and
lower parts of the figure moves from zero immediately after 5 ms, i.e. both at the same time.

Figure 9 shows a flow diagram of the second embodiment having steps T1 to T6.

T1 represents: Estimate the location of transients (beginning and end) in a first time
segment of an input signal, by an energy-based approach.
T2 represents: Modify the location of the transients by cutting and pasting to locations on a
predetermined time scale, and time-warp the signal parts in between.
T3 represents: Estimate the location of transients (beginning and end) in second and
subsequent time segments of the input signal.
T4 represents: Modify the location of the transients as above, and time-warp the signal parts
in between.
T5 represents: Decompose the audio signal into transient, tonal and noise components.
T6 represents: Recombine the decomposed signal for transmission or playback.
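The bookkeeping behind steps T2 and T4 may be sketched as follows, assuming the detected
transients are available as (begin, end) sample pairs sorted in time and the predetermined
time scale is a grid of integer multiples of a fixed step. The function name and data layout
are hypothetical; the sketch only computes where each transient goes and how long each signal
part in between must become, leaving the time-warping itself to a separate routine.

```python
def plan_relocation(transients, grid_step):
    """Return (moved, gaps) for a time-sorted list of (begin, end) transients.

    moved: the transients after their beginnings are snapped to the grid.
    gaps:  one (old_len, new_len) pair per signal part between two consecutive
           transients, i.e. the length before and after relocation.
    """
    moved = []
    for begin, end in transients:
        # Nearest grid point; ties go to the later point (arbitrary choice).
        new_begin = (begin + grid_step // 2) // grid_step * grid_step
        moved.append((new_begin, end + new_begin - begin))
    gaps = []
    for (_, end0), (begin1, _), (_, new_end0), (new_begin1, _) in zip(
            transients, transients[1:], moved, moved[1:]):
        gaps.append((begin1 - end0, new_begin1 - new_end0))
    return moved, gaps
```

For two transients at samples 1000 to 1150 and 3000 to 3100 with a grid step of 220, the part
in between must shrink from 1850 to 1830 samples.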

The method described in the second embodiment provides a more general procedure and
provides good results, which are an improvement on those of the first embodiment. The
time-warping principle is based on the knowledge of sound perception, and the procedure of
the second embodiment is less complex to implement and utilize.

The advantages of the second embodiment over prior art methods and also the first
embodiment are that the transient detection model is more general and provides good results
for various transients, not just short transients. Also, the time-warping of the signal parts
between transients is based on the knowledge of the properties of sound perception, such as
pitch perception and temporal masking effects. Furthermore, the method of the second
embodiment results in a significantly lower computational complexity.

Both of the methods disclosed herein provide a particularly advantageous method for coding
audio and video signals. In particular, restricting the transient locations simplifies the
analysis procedure in an audio coder (involving transient, sinusoidal and noise models)
significantly. Also, the side information associated with the corresponding segmentation is
reduced because of the restricted segmentation often used in the two embodiments described.

Furthermore, the introduced difference in transient locations is not of perceptual
importance.

The method could be implemented in devices for storing, transmitting, receiving, or
reproducing audio and/or video, e.g. solid state audio devices. Figure 10 shows an audio
coder 10 and an audio decoder 12 which receive an audio signal (A) for coding and a coded
signal (C) for decoding respectively, with the decoder 12 outputting the audio signal A. In
particular, the audio coder may be included in a transmitting or recording device, further
comprising a source or receiver for obtaining the audio signal and an output unit for
transmitting/outputting the coded signal to a transmission medium or a storage medium (e.g.
a solid state memory).

It should be noted that the above-mentioned embodiments illustrate rather than limit the
invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word 'comprising' does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.

In summary, an improved representation of transients in audio signals comprises modifying
transient locations in such a way that a transient can occur only at a beginning of a
sinusoidal segment. The modification procedure comprises the steps:
- detecting a beginning and an end of a transient using an energy-based approach with two
sliding rectangular windows;
- moving samples between the beginning and the end of the transient to the locations
specified by the segmentation used; and
- time-warping the signal parts in between the transients in order to fill the intervals
between the modified transients.

1. A method of coding an input signal, the method comprising:
- estimating a location of at least one transient in a time segment of the input signal;
the method being characterized by
- modifying the location of the transient so that the transient occurs at a specified
location on a predetermined time scale to obtain a modified signal; and
- modeling the modified signal.

2. A method of coding as claimed in claim 1, in which each transient is relocated to a
nearest specified location of a plurality of possible locations on the predetermined time
scale.

3. A method of coding as claimed in claim 1, in which the specified locations on the
predetermined time scale are defined by integer multiples of a predetermined minimum time
segment size.

4. A method of coding as claimed in claim 3, in which the predetermined minimum time
segment size has a length in the range of approximately 1 millisecond (ms) to approximately
9 ms.

5. A method of coding as claimed in claim 1, in which the modeling uses sinusoids to
represent the modified input signal.

6. A method of coding as claimed in claim 1, in which a restricted time segmentation is also
applied to tonal and/or noise components of the input signal.

7. A method of coding as claimed in claim 1, in which the estimation of the location of
transients is carried out using an energy-based approach.

8. A method of coding as claimed in claim 7, in which the estimation of the location of
transients is carried out using two sliding windows.

9. A method of coding as claimed in claim 1, in which the location of transients involves
the location of a beginning and an end of each transient.

10. A method of coding as claimed in claim 1, in which each located transient is moved by a
cut and paste method from its original location to begin at a location on the predetermined
time scale.

11. A method of coding as claimed in claim 10, in which a remaining section of the input
signal between two located modified transients is time-warped to fill the gap remaining
following the relocation.

12. A method of coding as claimed in claim 11, in which the time-warp is a lengthening or a
shortening of said remaining section.

13. A method of coding as claimed in claim 11, in which the time-warping preserves the
amplitudes of edge points of the modified signal.

14. A method of coding as claimed in claim 11, in which the time-warp is carried out by
interpolation where the change in the fundamental frequency of the remaining section is less
than approximately 0.3%.

15. A method of coding as claimed in claim 11, in which, where the change in the
fundamental frequency of the remaining section is more than or equal to 0.3%, the remaining
section is split into a first length immediately after the modified transient and a second
length.

16. A method of coding as claimed in claim 15, in which the first length is approximately
8 ms to 12 ms.

17. A method of coding as claimed in claim 14, in which, where the interpolation is
insufficient to fill a gap in the remaining section, an overlap-add procedure is used.

18. A method of coding as claimed in claim 1, in which the modification of the location of
the or each transient is performed using a transformation into a frequency domain.

19. Apparatus (10) for coding signals comprises an electronic processor operable to:
- estimate the location of one or more transients in a time segment of an audio or video
signal;
characterized by the processor being operable to modify the location of the or each transient
so that the or each transient occurs at a specified location on a predetermined time scale,
and the processor is operable to model the modified input signal.

20. Apparatus (10) as claimed in claim 19, which is an audio device.







03.05.2001



ABSTRACT: 



An improved representation of transients in audio signals comprises modifying transient
locations in such a way that a transient can occur only at a beginning of a sinusoidal
segment. The modification procedure comprises the steps:
- detecting a beginning and an end of a transient using an energy-based approach with two
sliding rectangular windows;
- moving samples between the beginning and the end of the transient to the locations
specified by the segmentation used; and
- time-warping the signal parts in between the transients in order to fill the intervals
between the modified transients.

(Figure 9)